PyTorch tutorial: LSTM
I started reading this with the understanding that an LSTM is an extension of an RNN
I remember this from reading 『ゼロから作るDeep Learning②』
In the case of an LSTM, for each element in the sequence, there is a corresponding hidden state h_t, which in principle can contain information from arbitrary points earlier in the sequence.
A hidden state h_t corresponds to each element of the sequence
In principle, the hidden state can contain information from any earlier point in the sequence
(i.e., information from before the current element of the sequence is available; a minimal recurrence sketch follows below)
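A minimal sketch of that recurrence, using nn.LSTMCell (my choice for illustration; the tutorial itself uses nn.LSTM below). Because h and c are carried from step to step, h_t can depend on every earlier input:
code:python
>>> import torch
>>> import torch.nn as nn
>>> cell = nn.LSTMCell(3, 3)                     # input_size=3, hidden_size=3
>>> h, c = torch.zeros(1, 3), torch.zeros(1, 3)  # initial state, batch=1
>>> for x_t in torch.randn(5, 1, 3):             # 5 time steps, each of shape (1, 3)
...     h, c = cell(x_t, (h, c))                 # h_t is computed from x_t and (h_{t-1}, c_{t-1})
...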
Pytorch’s LSTM expects all of its inputs to be 3D tensors.
The first axis is the sequence itself, the second indexes instances in the mini-batch, and the third indexes elements of the input.
Here both the 1st and 2nd axes (dimensions) have size 1 (see the shape sketch below)
2nd: mini-batching is not used (batch size 1)
1st: the sequence is fed one step at a time
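For example (these sizes are just for illustration), a 5-step sequence with batch size 1 and 3 features per step:
code:python
>>> import torch
>>> x = torch.randn(5, 1, 3)  # (seq_len, batch, input_size)
>>> x.size()
torch.Size([5, 1, 3])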
code:python
>>> import torch
>>> import torch.nn as nn
>>> torch.manual_seed(1)
>>> lstm = nn.LSTM(3, 3)  # input_size=3, hidden_size=3
>>> inputs = [torch.randn(1, 3) for _ in range(5)]  # make a sequence of length 5
>>> hidden = (torch.randn(1, 1, 3), torch.randn(1, 1, 3))  # initial (h_0, c_0)
>>> for i in inputs:
...     print(i)
...     out, hidden = lstm(i.view(1, 1, -1), hidden)
...     print(out)
...     print(hidden)
...
tensor([[-0.5525,  0.6355, -0.3968]])
tensor([[-0.6571, -1.6428,  0.9803]])
tensor([[-0.0421, -0.8206,  0.3133]])
tensor([[-1.1352,  0.3773, -0.2824]])
tensor([[-2.5667, -1.4303,  0.5009]])
(only the print(i) lines are shown; the per-step out and hidden output is omitted)
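At each step, out and both elements of hidden should have shape (1, 1, 3): (seq_len, batch, hidden_size) for out, and (num_layers, batch, hidden_size) for h_t and c_t. A quick check, continuing the session above:
code:python
>>> out.size()
torch.Size([1, 1, 3])
>>> hidden[0].size(), hidden[1].size()  # (h_t, c_t)
(torch.Size([1, 1, 3]), torch.Size([1, 1, 3]))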
The same example as above, with inputs concatenated into a single Tensor
code:python
>>> torch.cat(inputs).size()
torch.Size([5, 3])
>>> inputs = torch.cat(inputs).view(len(inputs), 1, -1)  # Add the extra 2nd dimension
>>> inputs.size()
torch.Size([5, 1, 3])
>>> hidden = (torch.randn(1, 1, 3), torch.randn(1, 1, 3))  # clean out the hidden state
>>> out, hidden = lstm(inputs, hidden)
>>> print(out)
tensor([[[-0.2945, -0.3090,  0.0366]],

        [[-0.5580, -0.1228,  0.0714]],

        [[-0.4122, -0.0834,  0.0380]],

        [[-0.1954, -0.0010,  0.0192]],

        [[-0.3722,  0.0672,  0.1393]]], grad_fn=<StackBackward0>)
>>> print(hidden)
The first value returned by LSTM is all of the hidden states throughout the sequence.
out contains every hidden state in the sequence
The second is just the most recent hidden state (compare the last slice of "out" with "hidden" below, they are the same)
hidden is the most recent hidden state
In the example above, hidden[0] (h_n) matches the last slice of out (checked in the sketch below)
"out" will give you access to all hidden states in the sequence
"hidden" will allow you to continue the sequence and backpropagate, by passing it as an argument to the lstm at a later time